Report of MIRACLE team for the Ad-Hoc track in CLEF 2006

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2006

This paper presents the 2006 MIRACLE team's approach to the Ad-Hoc Information Retrieval track. The experiments for this campaign continue testing our IR approach. First, a baseline set of runs is obtained using standard components: stemming, transforming, filtering, entity detection and extraction, and others. Then, an extended set of runs is obtained using several types of combinations of these baseline runs. Only a few changes were introduced for this campaign: we integrated an entity recognition and indexing prototype tool into our tokenizing scheme, and we ran more combination experiments for the robust multilingual case than in previous campaigns. However, no significant improvements were achieved. For this campaign, runs were submitted for the following languages and tracks:
- Monolingual: Bulgarian, French, Hungarian, and Portuguese.
- Bilingual: English to Bulgarian, French, Hungarian, and Portuguese; Spanish to French and Portuguese; and French to Portuguese.
- Robust monolingual: German, English, Spanish, French, Italian, and Dutch.
- Robust bilingual: English to German, Italian to Spanish, and French to Dutch.
- Robust multilingual: English to the robust monolingual languages.
We still need to work harder on some aspects of our processing scheme, the most important being, to our knowledge, entity recognition and normalization.
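The abstract does not say which operators were used to combine the baseline runs. As an illustration only, the following Python sketch fuses several runs with CombMNZ, a widely used score-fusion rule: scores are min-max normalized per run, summed, and multiplied by the number of runs that retrieved each document. The function names and the normalization step are assumptions, not the team's implementation.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one run's scores (doc_id -> score) to [0, 1] so runs are comparable."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_mnz(runs):
    """CombMNZ: sum of normalized scores times the number of runs retrieving the doc."""
    total = defaultdict(float)
    hits = defaultdict(int)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            total[doc] += score
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    # Highest fused score first; ties broken by document ID for a stable order.
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical baseline runs, e.g. a stemmed run and an entity-aware run.
run_a = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
run_b = {"d2": 3.0, "d4": 1.0}
print(comb_mnz([run_a, run_b]))  # ['d2', 'd1', 'd3', 'd4']
```

The multiplier rewards documents that several baseline runs agree on. This is why `d2` ranks first here even though it never tops `run_a`.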

Archivo Digital UPM

Report of MIRACLE team for Geographical IR in CLEF 2006

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Lana Serrano Sara
Publication venue: E.U.I.T. Telecomunicación (UPM)
Publication date: 01/01/2006

The main objective of the designed experiments is to test the effects of geographical information retrieval from documents that contain geographical tags. In these experiments we try to isolate geographical retrieval from textual retrieval. To do so, we replace all textual geo-entity references in the topics with the associated tags and split the retrieval process into two phases: textual retrieval from the textual part of the topic without geo-entity references, and geographical retrieval from the tagged text generated by the topic tagger. Textual and geographical results are combined by applying different techniques: union, intersection, difference, and external (outer) join. Our geographic information retrieval system consists of a set of basic components organized in two categories: (i) linguistic tools oriented to textual analysis and retrieval and (ii) resources and tools oriented to geographical analysis. These tools are combined to carry out the different phases of the system: (i) document and topic analysis, (ii) relevant document retrieval and (iii) result combination. Compared with last campaign's results, mean average precision gets worse when the textual geo-entity references are replaced with geographical tags. Part of this drop occurs because our experiments return zero relevant documents if no documents satisfy the geographical sub-query. However, if we analyze only the queries that satisfied both the textual and the geographical terms, we observe that the designed experiments retrieve relevant documents quickly, improving R-Precision values. We conclude that the developed geographical information retrieval system is very sensitive to textual georeferences, and that it is therefore necessary to improve the named entity recognition module.
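The following Python sketch shows how the four combination techniques could be applied to a textual run and a geographical run. It assumes each run is a dict mapping document IDs to scores and reads "external join" as a left outer join: every textual hit is kept and receives a weighted geographic score where one exists. The mode names and the `geo_weight` parameter are hypothetical. The `intersection` mode returns nothing whenever the geographical sub-query matches nothing, which is the failure mode the abstract describes.

```python
def combine(textual, geo, mode, geo_weight=0.5):
    """Combine textual and geographic runs (dicts of doc_id -> score).

    Modes (hypothetical names):
      union        - documents found by either sub-query
      intersection - documents found by both sub-queries
      difference   - textual hits that the geographic sub-query did not match
      left_join    - every textual hit, boosted when it also has a geo score
    """
    text_docs, geo_docs = set(textual), set(geo)
    if mode == "union":
        docs = text_docs | geo_docs
    elif mode == "intersection":
        docs = text_docs & geo_docs
    elif mode == "difference":
        docs = text_docs - geo_docs
    elif mode == "left_join":
        docs = text_docs
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Weighted sum of the two scores; a missing score counts as 0.
    scores = {d: textual.get(d, 0.0) + geo_weight * geo.get(d, 0.0) for d in docs}
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


textual = {"d1": 2.0, "d2": 1.0, "d3": 0.5}
geo = {"d2": 1.0, "d4": 3.0}
print(combine(textual, geo, "union"))         # ['d1', 'd2', 'd4', 'd3']
print(combine(textual, geo, "intersection"))  # ['d2']
print(combine(textual, {}, "intersection"))   # [] (empty geo sub-query)
```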

Archivo Digital UPM

Miracle’s 2005 Approach to Cross-lingual Information Retrieval

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2005

This paper presents the 2005 Miracle team's approach to Bilingual and Multilingual Information Retrieval. In the multilingual track, we concentrated our work on merging the results of monolingual runs into the overall multilingual result, relying on available translations. In the bilingual and multilingual tracks, we used available translation resources and, in some cases, a combining approach.
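The abstract does not detail the merging strategy. The Python sketch below illustrates two textbook ways to merge per-language result lists into one multilingual ranking; neither is claimed to be the team's method. Round-robin interleaving ignores scores entirely. Normalized-score merging rescales each list's raw scores to [0, 1] before pooling, because raw scores from different collections are not directly comparable. All names and sample data are hypothetical.

```python
def round_robin(ranked_lists):
    """Interleave ranked lists of doc IDs: first of each list, then second, and so on."""
    merged, seen = [], set()
    depth = max((len(lst) for lst in ranked_lists), default=0)
    for rank in range(depth):
        for lst in ranked_lists:
            if rank < len(lst) and lst[rank] not in seen:
                seen.add(lst[rank])
                merged.append(lst[rank])
    return merged


def normalized_merge(runs):
    """Pool runs (doc_id -> raw score) after min-max normalizing each one."""
    pooled = {}
    for run in runs:
        if not run:
            continue
        lo, hi = min(run.values()), max(run.values())
        for doc, score in run.items():
            norm = 1.0 if hi == lo else (score - lo) / (hi - lo)
            # Keep the best normalized score if a doc appears in several runs.
            pooled[doc] = max(pooled.get(doc, 0.0), norm)
    return sorted(pooled, key=lambda d: (-pooled[d], d))


print(round_robin([["en1", "en2", "en3"], ["es1"], ["fr1", "fr2"]]))
# ['en1', 'es1', 'fr1', 'en2', 'fr2', 'en3']
print(normalized_merge([{"en1": 12.0, "en2": 6.0, "en3": 3.0},
                        {"es1": 0.9, "es2": 0.8, "es3": 0.1}]))
# ['en1', 'es1', 'es2', 'en2', 'en3', 'es3']
```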

Archivo Digital UPM

Miracle’s 2005 Approach to Monolingual Information Retrieval

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2005

This paper presents the 2005 Miracle team's approach to Monolingual Information Retrieval. The goal of this year's experiments was twofold: to continue testing the effect of combination approaches on information retrieval tasks, and to improve our basic processing and indexing tools, adapting them to new languages with unusual encoding schemes. The starting point was a set of basic components: stemming, transforming, filtering, proper noun extraction, paragraph extraction, and pseudo-relevance feedback. Some of these basic components were used in different combinations and orders of application for document indexing and for query processing. Second-order combinations were also tested by averaging or selectively combining the documents retrieved by different approaches for a particular query.
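Pseudo-relevance feedback is one of the listed components. The Python sketch below shows a minimal form of it: assume the top `k` retrieved documents are relevant, count their terms, and append the `m` most frequent terms not already in the query. The parameter values, the plain frequency-based term selection and the sample data are assumptions for illustration. Real systems often weight expansion terms, for example with Rocchio or probabilistic term weights.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, k=3, m=2, stopwords=frozenset()):
    """Append the m most frequent new terms from the top-k ranked documents.

    query_terms: list of query tokens.
    ranked_docs: list of token lists, best-ranked document first.
    """
    query_set = set(query_terms)
    counts = Counter()
    for doc in ranked_docs[:k]:  # assume the top k documents are relevant
        counts.update(t for t in doc if t not in query_set and t not in stopwords)
    expansion = [term for term, _ in counts.most_common(m)]
    return list(query_terms) + expansion


docs = [
    ["solar", "panel", "energy", "panel", "cost"],
    ["panel", "grid", "solar"],
    ["grid", "storage"],
    ["tax"],  # ranked 4th, so ignored when k=3
]
print(expand_query(["solar", "energy"], docs))  # ['solar', 'energy', 'panel', 'grid']
```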

Archivo Digital UPM

Framework document: Workshop on Higher Education in Computer Science for the Nineties

Author: González Cristóbal José Carlos
Sáez Vacas Fernando
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/1991

Archivo Digital UPM

DAEDALUS at PAN 2014: Guessing tweet author's gender and age

Author: González Cristóbal José Carlos
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2014

This paper describes our participation in the PAN 2014 author profiling task. Our idea was to define, develop and evaluate a simple machine learning classifier able to guess the gender and age of a given user from his/her texts, which could become part of the company's solution portfolio. We were not looking for the best possible classifier, the one achieving the highest accuracy. Instead, we wanted the optimum balance between performance and throughput, using the simplest strategy and the one least dependent on external systems. Results show that our software, which uses Multinomial Naive Bayes with a term vector model representation of the text, ranks quite well among the other participants in terms of accuracy.
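The Python sketch below shows, from scratch, the kind of classifier the abstract names: Multinomial Naive Bayes over term-count vectors with Laplace (add-alpha) smoothing. The whitespace-style tokens, the `alpha` default, the age-group labels and the training snippets are hypothetical. This is a minimal sketch, not the team's software. The same class could be trained separately for gender and for age.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over term counts, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)    # documents per class (priors)
        self.term_counts = defaultdict(Counter)  # term frequencies per class
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        self.total_terms = {c: sum(cnt.values()) for c, cnt in self.term_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            # log P(c) + sum over tokens of log P(token | c); unseen terms are skipped.
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.total_terms[c] + self.alpha * len(self.vocab)
            for t in tokens:
                if t in self.vocab:
                    score += math.log((self.term_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


docs = [["exam", "party", "lol"], ["lol", "homework", "party"],
        ["mortgage", "kids", "meeting"], ["meeting", "kids", "budget"]]
labels = ["18-24", "18-24", "35-49", "35-49"]
model = MultinomialNB().fit(docs, labels)
print(model.predict(["party", "lol", "tonight"]))  # 18-24
print(model.predict(["kids", "budget"]))           # 35-49
```

Working in log space avoids floating-point underflow on long texts. Training and prediction are single passes over the token counts, which fits the abstract's emphasis on throughput.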

Archivo Digital UPM

Framework document (Workshop for information, debate and foresight on critical problems, priority areas and approaches to higher education in the field of computer science)

Author: González Cristóbal José Carlos
Sáez Vacas Fernando
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/1991

University schools in general, and those that teach computer science in particular, currently face the pressing obligation to renew their curricula. Unfortunately, universities tend to undertake this task on their own, apart from (if not with their backs turned to) the professional, economic and social environment. Companies, for their part, tend to complain that university graduates receive training poorly suited to their needs, but they rarely assume their role and responsibility in these matters. Faced with this situation, this workshop aims to involve all interested sectors. We start from the assumption (obvious, though seldom put into practice) that education concerns not only the educational administration but the whole social system.

Archivo Digital UPM

Systems teaching in computing-related curricula: an accumulated experience

Author: González Cristóbal José Carlos
Sáez Vacas Fernando
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 02/11/1994

This text proposes incorporating systems ideas as an indispensable element of university curricula related to computing. To this end, it presents the experience accumulated at the Escuela Técnica Superior de Ingenieros de Telecomunicación de Madrid since 1978. More specifically, it describes the objectives, methodology and results of a course called Ingeniería de Sistemas (Systems Engineering), taught in the final year of a specialization in Ingeniería Telemática (Telematics Engineering: Computing and Communications).

Archivo Digital UPM

MIRACLE Retrieval Experiments with East Asian Languages

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Martínez Fernández José Luis
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2005

This paper describes the participation of MIRACLE in the NTCIR 2005 CLIR task. Our group has a strong background and long expertise in Computational Linguistics and Information Retrieval applied to European languages written in the Latin and Cyrillic alphabets, but this was our first attempt at East Asian languages. Our main goal was to study the particularities and distinctive characteristics of Japanese, Chinese and Korean, focusing especially on their similarities to and differences from European languages. We also aimed to carry out research on CLIR tasks involving those languages. The basic idea behind our participation in NTCIR is to test whether the same familiar linguistic-based techniques are also applicable to East Asian languages, and to study the necessary adaptations.

Archivo Digital UPM

MIRACLE’s Naive Approach to Medical Images Annotation

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Martínez Fernández José Luis
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2005

One of the proposed tasks of the ImageCLEF 2005 campaign was an Automatic Annotation Task. The objective is to classify a given set of 1,000 previously unseen medical (radiological) images into 57 predefined categories covering different medical pathologies. A total of 9,000 classified training images are provided, which can be used in any way to train a classifier. The Automatic Annotation task uses no textual information, only image-content information. This paper describes our participation in the automatic annotation task of ImageCLEF 2005.

Archivo Digital UPM